Empirical analysis of Zipf’s law, power law, and lognormal distributions in medical discharge reports

Abstract

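Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions. This paper empirically analyses whether text in medical discharge reports follows Zipf’s law, a commonly assumed statistical property of language in which word frequency follows a discrete power-law distribution. We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequency, fitting power-law distributions to the data, and testing whether alternative distributions (lognormal, exponential, stretched exponential, and truncated power-law) provided superior fits to the data. Results show that discharge reports are best fit by the truncated power-law and lognormal distributions. Discharge reports appear to be near-Zipfian, with the truncated power-law providing superior fits over a pure power-law. Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power-law and lognormal probability priors and non-parametric models that capture power-law behavior.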
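The tokenize, count, fit and compare steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regex tokenizer, the hypothetical helper names (`token_frequencies`, `fit_power_law_alpha`, `fit_truncated_lognormal`, `compare_power_law_lognormal`), the hand-picked `xmin`, and the compass-search fit of the lognormal are all assumptions made for illustration. It follows the approach of Clauset, Shalizi and Newman: the exponent α is estimated by maximum likelihood using the continuous approximation with lower bound xmin - 0.5 for integer counts, and the power law is compared with a lognormal truncated to the same support through the log-likelihood ratio R and a Vuong-style p-value. Under Zipf's law with rank exponent s, the distribution of word frequencies has exponent α = 1 + 1/s, so strictly Zipfian text (s ≈ 1) gives α ≈ 2.

```python
import math
import re
from collections import Counter


def token_frequencies(reports):
    """Lowercase each report, split it into alphabetic tokens, and count them.

    The regex tokenizer is a simplifying assumption, not the paper's tokenizer.
    """
    counts = Counter()
    for text in reports:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def fit_power_law_alpha(tail, lower):
    """Approximate MLE of alpha for integer counts, with lower = xmin - 0.5."""
    return 1.0 + len(tail) / sum(math.log(x / lower) for x in tail)


def power_law_logpdf(x, lower, alpha):
    # Continuous power law on [lower, inf): p(x) = (alpha - 1) / lower * (x / lower) ** -alpha
    return math.log((alpha - 1.0) / lower) - alpha * math.log(x / lower)


def lognormal_logpdf(x, lower, mu, sigma):
    # Lognormal density renormalised to the same support [lower, inf).
    tail_mass = 0.5 * math.erfc((math.log(lower) - mu) / (sigma * math.sqrt(2.0)))
    if tail_mass <= 0.0:
        return float("-inf")
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z - math.log(tail_mass)


def fit_truncated_lognormal(tail, lower, tol=1e-4, max_iter=5000):
    """Maximise the truncated-lognormal log-likelihood with a simple compass search.

    Starts from the untruncated MLE (mean and std of log counts); a local optimiser only.
    """
    logs = [math.log(x) for x in tail]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs)) or 1.0

    def loglik(m, s):
        return sum(lognormal_logpdf(x, lower, m, s) for x in tail)

    best, step = loglik(mu, sigma), 1.0
    for _ in range(max_iter):
        if step < tol:
            break
        for dm, ds in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            m, s = mu + dm, sigma + ds
            if s > 0.0:
                ll = loglik(m, s)
                if ll > best:
                    mu, sigma, best = m, s, ll
                    break
        else:
            step /= 2.0  # no neighbour improved the likelihood: shrink the step
    return mu, sigma


def compare_power_law_lognormal(freqs, xmin):
    """Return (alpha, R, p); R > 0 favours the power law, R < 0 the lognormal.

    p is the Vuong-style significance of the sign of R. Requires integer xmin >= 1
    and at least one count >= xmin.
    """
    lower = xmin - 0.5  # continuity correction for integer counts
    tail = [f for f in freqs if f >= xmin]
    n = len(tail)
    alpha = fit_power_law_alpha(tail, lower)
    mu, sigma = fit_truncated_lognormal(tail, lower)
    diffs = [power_law_logpdf(x, lower, alpha) - lognormal_logpdf(x, lower, mu, sigma)
             for x in tail]
    r = sum(diffs)
    mean = r / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    p = math.erfc(abs(r) / (sd * math.sqrt(2.0 * n))) if sd > 0.0 else 1.0
    return alpha, r, p
```

Applied to a corpus, `list(token_frequencies(texts).values())` gives the integer counts to pass as `freqs`; a negative R with a small p favours the lognormal, which is the kind of comparison the abstract reports. The exponential, stretched exponential and truncated power-law alternatives would be handled the same way with their own truncated log-densities, and xmin is usually chosen by minimising the Kolmogorov-Smirnov distance rather than fixed by hand. The `powerlaw` Python package implements these fits and comparisons (for example `Fit.distribution_compare`).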

Similar resources

Mobile Phone Social Networks: Beyond Power-Law and Lognormal Distributions

We analyze a massive social network gathered from a large mobile phone operator’s records, comprised of millions of users and tens of millions of calls. We examine the following questions: what is the distribution of phone calls per customer; total talk time per customer; and distinct partners per customer? We find that these distributions are skewed, and that they significantly deviate from wh...

Power-Law Distributions in Empirical Data

Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA; Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA; Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48...

Power-Law Distributions in Empirical Data

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty ...

Sampling power-law distributions

Power-law distributions describe many phenomena related to rock fracture. Data collected to measure the parameters of such distributions only represent samples from some underlying population. Without proper consideration of the scale and size limitations of such data, estimates of the population parameters, particularly the exponent D, are likely to be biased. A Monte Carlo simulation of the s...

A Brief History of Generative Models for Power Law and Lognormal Distributions (draft manuscript)

Power law distributions are an increasingly common model for computer science applications; for example, they have been used to describe file size distributions and in- and out-degree distributions for the Web and Internet graphs. Recently, the similar lognormal distribution has also been suggested as an appropriate alternative model for file size distributions. In this paper, we briefly survey s...

Journal

Journal title: International Journal of Medical Informatics

Year: 2021

ISSN: 1386-5056, 1872-8243

DOI: https://doi.org/10.1016/j.ijmedinf.2020.104324